
Deobfuscation of the code base + pep8 and fixes #481

Closed · wants to merge 307 commits

Conversation


@hill-a commented Jul 27, 2018

  • Fixed tf.Session().__enter__() being used rather than sess = tf.Session() with the session passed explicitly to the objects (see the sketch at the end of this comment)
  • Fixed uneven scoping of TensorFlow sessions throughout the code
  • Fixed the rolling vecwrapper to handle observations that are not only grayscale images
  • Fixed deepq saving the environment when trying to save itself
  • Fixed ValueError: Cannot take the length of Shape with unknown rank. in acktr when running the run_atari.py script
  • Fixed graph conflicts when calling baselines algorithms sequentially
  • Fixed the mean-on-empty-array warning in deepq
  • Fixed the kfac eigendecomposition not being cast to float64 when the parameter use_float64 is set to True
  • Fixed the Dataset data loader not correctly resetting its index position when shuffling is disabled
  • Fixed EOFError when reading from the connection in the worker in subproc_vec_env.py
  • Fixed behavior_clone weight loading and saving for GAIL
  • Avoided taking the square root of a negative number in trpo_mpi.py
  • Removed some duplicated code (a2cpolicy, trpo_mpi)
  • Removed the unused, undocumented and crashing function reset_task in subproc_vec_env.py
  • Reformatted code to PEP 8 style
  • Documented the entire codebase
  • Added atari tests
  • Added logger tests

Missing: tests for acktr continuous (plus HER and GAIL, but they rely on MuJoCo...)
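
For context, a minimal sketch of the session-handling change from the first bullet, assuming TF1-style sessions; the Model class below is hypothetical, standing in for the baselines policy and algorithm objects:

```python
import tensorflow as tf  # TF1-style API, as baselines used at the time

# Pattern the PR removes: entering a session implicitly via __enter__(),
# which hides the session's lifetime and leads to uneven scoping:
# tf.Session().__enter__()  # the returned session is never explicitly closed

# Pattern the PR introduces: create one session and pass it explicitly.
class Model:
    """Hypothetical consumer; the real ones are the baselines policy objects."""
    def __init__(self, sess):
        self.sess = sess  # every graph evaluation goes through this handle

sess = tf.Session()
model = Model(sess=sess)
sess.close()  # the owner of the session controls its lifetime
```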

@hill-a closed this Aug 29, 2018
@pzhokhov
Collaborator

Hi @hill-a and @araffin! Any chance you guys are still interested in merging your changes? The amount of work that you have done on improving the code quality is incredible; and we'd love to cooperate instead of competing. If you are, let's reopen this and work on the merge conflicts / divergence points between refactors together. Putting @joschu, @miramurati and @jachiam as principal stakeholders in the loop on this as well.

@araffin
Contributor

araffin commented Sep 22, 2018

Hello @pzhokhov,

We are glad you finally gave some sign of life ;) (almost two months!)

With @hill-a, we would love to see those changes merged (even though the two codebases have diverged a bit, and it will take some work to fix the conflicts).
However, we would like some clarifications on OpenAI Baselines and the way it is managed, because we don't want to merge the changes only to see the code broken again afterward.

My first question is a general one: how is OpenAI Baselines managed, what is the roadmap, and how much importance does OpenAI give to Baselines?

From what I understood, you are now the only one fully in charge of Baselines (as @unixpickle was some months ago), and each algorithm in the repo comes from a different person (e.g. @joschu for TRPO, @matthiasplappert for HER, ...), which explains the code duplication and the differing code styles.

Another question: there seems to be an internal Baselines repo that differs from the GitHub version. What are the differences and why is it needed? (I think this is related, but could you explain some weird commits like this one, with lots of commit text for almost no changes: e791565?)
A bit more transparency would be appreciated; it would avoid regressions (e.g. #344, which was fixed in 4993286 but reintroduced in 9fa8e1b).

Finally, if we merged, we obviously would like to be credited in the README ;)

@jachiam

jachiam commented Sep 22, 2018

Hi @araffin and @hill-a!

Chiming in to say hello, introduce myself, and to answer a few of the questions.

Intro: I'm Josh, and I'm a researcher on the safety team at OpenAI! Lately I've been thinking about getting more involved in making sure our public software releases are more user-friendly.

On how Baselines is managed: You're correct that @pzhokhov is the primary owner of Baselines, and that the algorithms were originally implemented by many different people. OpenAI is currently thinking about the long-term state of this project (and others), and will hopefully be able to discuss a complete plan in the near future. Since @pzhokhov has been writing the roadmap for Baselines, I'll leave it to him to talk about that.

On the internal version of Baselines: recall that OpenAI is first and foremost a research organization. The internal repo changes rapidly to support research use cases, and includes code from in-progress research projects which aren't ready for release.

Regarding credit: of course we would gratefully acknowledge such contributions. :)

@pzhokhov
Collaborator

Hi @araffin and @hill-a ! Thanks for maintaining your interest in the project!
Continuing Josh's idea of personal intros - I am Peter, an ML engineer on the OpenAI games team.
Regarding your questions (simple ones first :))

  1. Credit - of course

  2. Internal version - the motivation is outlined by Josh. I can add that we used to have an entirely separate private repo; the export / import was a manual process, error-prone as you pointed out, and it led to maintenance of baselines being a bit of an afterthought. Now our internal version is a git subrepo fork of baselines, which gives much tighter integration, leaves little chance of accidentally overwriting bug fixes, and keeps the git history correct.

  3. Management - at the moment I am in charge of baselines, but that's not to say I am the only one who writes code in it - there have been quite a few patches and bug fixes from both internal and external contributors recently. We are also thinking of augmenting the team responsible for baselines with more people (@miramurati may be in a better position to comment on that). Your assessment of the reasons for the code duplication is 100% accurate.

  4. Roadmap... Step 1) fix existing algos; step 2) implement new algos; step 3) profit :)
    Speaking seriously though, our roadmap is a bit of a fire-fighting one, and looks more like a loosely sorted laundry list (in its current form, it does not take your changes into account).

  • Docs + unit tests

  • Visualization framework - something to easily connect / compare runs; maybe as simple as helper utils for tensorboard (scope is a bit unclear)

  • Run and publish all benchmarks; need to confirm that it's ok legally to publish mujoco-based benchmarks, maybe switch to roboschool

  • Refactor DDPG, HER, GAIL to conform to common API

  • Fix DQN hyperparams with Atari - find hyperparameters that work for all games

  • Recurrent model compatibility everywhere, or wherever possible (DQN, TRPO, ACER, ACKTR, HER, GAIL are not compatible right now)

  • Extra touches on ACER - use VecFrameStack instead of internal stacker, make compatible with observations other than stacks of images, unit tests

  • Extra touches on TRPO - compatibility with VecEnvs, remove env.reset(), Model wrapper

  • Extra touches on DQN - remove extra env.reset(), Model wrapper

  • Serialization syntactic sugar (easier ways of saving / loading the model; maybe also with the results of training) - see the sketch after this list

  • More algorithms - Vanilla PG, Soft actor-critic, Soft Q-learning, IMPALA, etc
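
To make the serialization item concrete: a minimal sketch of what one-call saving/loading could look like. The class and method names are assumptions for illustration, not an existing baselines API:

```python
import pickle

class SerializableModel:
    """Hypothetical sketch: one-call save/load instead of manual
    tf.train.Saver plumbing; names are illustrative only."""

    def __init__(self, params=None):
        self.params = params if params is not None else {}

    def save(self, path):
        # Persist parameters (and, per the roadmap item, possibly
        # training results alongside them).
        with open(path, "wb") as f:
            pickle.dump(self.params, f)

    @classmethod
    def load(cls, path):
        # Restore a model without rebuilding the graph plumbing by hand.
        with open(path, "rb") as f:
            return cls(params=pickle.load(f))

# Usage: model.save("model.pkl"); restored = SerializableModel.load("model.pkl")
```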

@araffin
Contributor

araffin commented Oct 5, 2018

Hello @pzhokhov,

Thank you for your answer.
It could be a good idea to have a public roadmap (not just written in a closed pull request) with estimated dates if possible.

My main concern is still the internal repo. I perfectly understand the usefulness of a private repo for in-progress research. However, as of now, it impacts the public repo in unpredictable ways: commit messages coming from the subrepo do not reflect the changes, and some code that is unused in the public repo is also added. A solution would be to have a proper fork of baselines and open PRs from time to time to update the public repo.

@jperl mentioned this pull request Oct 15, 2018
@pzhokhov
Collaborator

Public roadmap - fair point; we are working on finalizing it (should be done and published in a few days).
As for the internal repo - I understand the concern; at the same time there is quite a bit of value for us in having a subrepo fork, as several internal repos depend on baselines and need to be changed in accord. I believe a reasonable solution is to push internal changes into a feature branch and make PRs into master - that way the changes are more transparent and have a chance of getting external comments.

@pzhokhov
Collaborator

Turns out the baselines roadmap has high enough visibility that the bosses stepped in and want to polish it and make a press release with it, so I cannot publish it before that.

@tomasruizt

Hi, @pzhokhov what does that mean in terms of timing? A month delay?

@jrjbertram

Coming into this conversation from the outside... as a user of both pieces of software, I have found stable-baselines much more consistent. From a research side, it's great that you are publishing the algorithm implementations; having vetted implementations is a great resource, especially when the authors of the original papers clarify subtle points in the openai implementations.

From a world-impact perspective, the changes that stable-baselines has made take openai baselines and make it actually useful, turning the "raw" algorithms into something consistent, approachable, and reusable... I would argue that openai should consider stable-baselines the project to contribute to. I'm only half joking... presumably openai's goal is for gym and baselines to eventually take on a life of their own... this seems like a perfect way to do it.

Both teams are doing great work... as a community member would love to find a way for you both to benefit from a strong relationship, as you each bring something valuable to the table!

Back to my corner...

  • J.

@jonas-eschle

jonas-eschle commented Jan 3, 2019

Commenting as a user as well:

what OpenAI has done is great: it went for unification, providing common gyms and providing baselines. The idea and some implementations are great, no question. And don't let the following change that: I'm a huge fan of what you do/did; it is one of those cornerstones that really moves the community forward!

But: the repositories are not quite what they should be (talking about gym and baselines):

  • missing unit tests
  • basically no (useful) docs (with sentences like "It's very easy to..." where the actual explanation or an example should be... guys!)
  • coming from a put-together approach (which may be a good thing at the beginning) instead of starting from common ground (e.g. baseline implementations)
  • ...which implies things like no common coding standards

stable-baselines has done exactly all that. Baselines is an experimental, you-have-to-figure-it-out-yourself repository, while stable-baselines is the stable, user-friendly repository (that OpenAI may have aimed for).

My opinion: gym should become more like stable-baselines, and stable-baselines should become standalone. It is just so much better to use (yes, the extra effort is not too much, but it's about the way of approaching the user).

Again: great compliments for what you've done at OpenAI. But user-friendliness and the last 20% of stability are what seem to be missing (for me, IMHO, as a user), and that is where stable-baselines seems to know what it's doing.

@federicofontana

gym.Env and gym.Space have been trusted and widely used by many people, and they are now an industry standard. Have you considered taking the same direction with a gym.Agent? If so, merging openai/baselines with stable-baselines would be a huge step forward.

As of today:

  1. openai/baselines is probably the most reliable collection of implementations of RL agents
  2. there is not a single well-established RL framework in Python. keras-rl is dead, tensorforce is dying; ray/rllib might be your only "competitor"

Considering the excellent reputation of OpenAI, I wouldn't be surprised if openai/baselines became the standard RL framework one day. However, a necessary condition would be to clean up the spaghetti code, or at least provide a standard interface to the users (see the sketch below).

@araffin @pzhokhov
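
For concreteness, a minimal sketch of what a hypothetical gym.Agent could look like; gym.Agent does not exist, and the method names below are assumptions, loosely mirroring the style of gym.Env:

```python
import gym

class Agent:
    """Hypothetical gym.Agent: a standard interface in the spirit of
    gym.Env / gym.Space (not an actual gym class)."""

    def __init__(self, observation_space: gym.Space, action_space: gym.Space):
        self.observation_space = observation_space
        self.action_space = action_space

    def act(self, observation):
        """Return an action for the given observation."""
        raise NotImplementedError

    def learn(self, env: gym.Env, total_timesteps: int):
        """Train on the given environment; concrete algorithms override this."""
        raise NotImplementedError
```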

zhuyifengzju pushed a commit to zhuyifengzju/baselines that referenced this pull request Sep 29, 2019
* Update DQN arguments: add double_q

* Fix CI build

* [ci skip] Update link

* Replace pdf link with arxiv link